

Search for: All records

Creators/Authors contains: "Guha, Sharmistha"


  1. Free, publicly-accessible full text available July 1, 2026
  2. This article investigates the impact of aging on functional connectivity across different cognitive control scenarios, with emphasis on identifying brain regions significantly associated with early aging. When functional connectivity within each cognitive control scenario is conceptualized as a graph, with brain regions as nodes, the statistical challenge is to devise a regression framework that predicts a binary scalar outcome (aging or normal) from multiple graph predictors. Popular regression methods that use multiplex graph predictors often fail to fully exploit information within and across graph layers, potentially leading to less accurate inference and prediction, especially for smaller sample sizes. To address this challenge, we propose the Bayesian Multiplex Graph Classifier (BMGC). Accounting for multiplex graph topology, our method models edge coefficients at each graph layer using bilinear interactions between the latent effects associated with the two nodes connected by the edge. This approach also employs a variable selection framework on node-specific latent effects from all graph layers to identify influential nodes linked to observed outcomes. Crucially, the proposed framework is computationally efficient and quantifies the uncertainty in node identification, coefficient estimation, and binary outcome prediction. In simulation studies, BMGC outperforms alternative methods on these metrics. BMGC was further validated using an fMRI study of brain networks in adults. It found that the sensory motor brain network obeys certain lateral symmetries, whereas the default mode network exhibits significant brain asymmetries associated with early aging.
  3. We consider causal inference for observational studies with data spread over two files. One file includes the treatment, outcome, and some covariates measured on a set of individuals, and the other file includes additional causally relevant covariates measured on a partially overlapping set of individuals. By linking records in the two databases, the analyst can control for more covariates, thereby reducing the risk of bias compared to using one file alone. When analysts do not have access to a unique identifier that enables perfect, error-free linkages, they typically rely on probabilistic record linkage to construct a single linked data set, and estimate causal effects using these linked data. This typical practice does not propagate uncertainty from imperfect linkages to the causal inferences. Further, it does not take advantage of relationships among the variables to improve the linkage quality. We address these shortcomings by fusing regression-assisted, Bayesian probabilistic record linkage with causal inference. The Markov chain Monte Carlo sampler generates multiple plausible linked data files as byproducts that analysts can use for multiple imputation inferences. Here, we show results for two causal estimators based on propensity score overlap weights. Using simulations and data from the Italy Survey on Household Income and Wealth, we show that our approach can improve the accuracy of estimated treatment effects.
  4. There is a profound need to identify modifiable risk factors to screen for and prevent pancreatic cancer. Air pollution, including fine particulate matter (PM2.5), is increasingly recognized as a risk factor for cancer. We conducted a case-control study using data from the electronic health record (EHR) of Duke University Health System, 15-year residential history, NASA satellite PM2.5 data, and neighborhood socioeconomic data. Using deterministic and probabilistic linkage algorithms, we linked residential history and EHR data to quantify long-term PM2.5 exposure. Logistic regression models quantified the association between a one-interquartile-range (IQR) increase in PM2.5 concentration and pancreatic cancer risk. The study included 203 cases and 5027 controls (median age of 59 years, 62% female, 26% Black). Individuals with pancreatic cancer had higher average annual exposure (9.4 μg/m³) than controls. An IQR increase in average annual PM2.5 was associated with greater odds of pancreatic cancer (odds ratio = 1.20; 95% CI, 1.00-1.44). These findings highlight the link between elevated PM2.5 exposure and increased pancreatic cancer risk. They may inform screening strategies for high-risk populations and guide air pollution policies to mitigate exposure. This article is part of a Special Collection on Environmental Epidemiology.
  5. Motivated by brain connectome datasets acquired using diffusion weighted magnetic resonance imaging (DWI), this article proposes a novel generalized Bayesian linear modeling framework with a symmetric tensor response and scalar predictors. The symmetric tensor coefficients corresponding to the scalar predictors are embedded with two features: low-rankness and group sparsity within the low-rank structure. Besides offering computational efficiency and parsimony, these two features enable identification of important “tensor nodes” and “tensor cells” significantly associated with the predictors, with characterization of uncertainty. The proposed framework is empirically investigated under various simulation settings and with a real brain connectome dataset. Theoretically, we establish that the posterior predictive density from the proposed model is “close” to the true data generating density, with closeness measured by the Hellinger distance between the two densities. This distance scales at a rate very close to the finite dimensional optimal rate, depending on how the number of tensor nodes grows with the sample size.
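The last abstract above describes symmetric tensor coefficients that are both low-rank and group-sparse. The sketch below is one possible reading of that idea for the simplest case, a two-way symmetric tensor (a matrix). It is an illustration only, not the authors' model or code. The parameterization `B[j][k] = sum_r lam[r] * U[j][r] * U[k][r]`, the names `U`, `lam`, `low_rank_symmetric`, and `active_nodes`, and all numeric values are assumptions introduced here. Under this assumed form, zeroing every latent entry of a node (a group of parameters) removes all cells that touch that node, which is how group sparsity could flag unimportant "tensor nodes".

```python
import random


def low_rank_symmetric(U, lam):
    """Build B with B[j][k] = sum_r lam[r] * U[j][r] * U[k][r].

    U is a p x R list of node-specific latent vectors and lam holds R
    rank weights (both hypothetical names for this illustration).
    """
    p = len(U)
    return [
        [sum(w * U[j][r] * U[k][r] for r, w in enumerate(lam)) for k in range(p)]
        for j in range(p)
    ]


def active_nodes(U, tol=1e-12):
    """Return indices of nodes whose latent vector is not entirely zero."""
    return [j for j, row in enumerate(U) if any(abs(v) > tol for v in row)]


random.seed(0)
p, rank = 5, 2  # 5 nodes, rank-2 structure (illustrative sizes)

# Draw latent vectors, then zero out whole rows to mimic group sparsity.
U = [[random.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(p)]
U[1] = [0.0] * rank
U[3] = [0.0] * rank
lam = [1.0, -0.5]

B = low_rank_symmetric(U, lam)

print("active nodes:", active_nodes(U))
for row in B:
    print(["%6.2f" % v for v in row])
```

In this sketch the coefficient matrix needs `p * rank + rank` free values rather than `p * (p + 1) / 2`, which is the parsimony that a low-rank form buys. Rows and columns 1 and 3 of `B` are entirely zero because their latent vectors were set to zero.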